Abstract

We consider in-network computation of an arbitrary function over an arbitrary communication network. A network with capacity constraints on the links is given. Some nodes in the network generate data, e.g., like sensor nodes in a sensor network. An arbitrary function of this distributed data is to be obtained at a terminal node. The structure of the function is described by a given computation schema, which in turn is represented by a directed tree. We design computing and communicating schemes to obtain the function at the terminal at the maximum rate. For this, we formulate linear programs to determine network flows that maximize the computation rate. We then develop a fast combinatorial primal-dual algorithm to obtain $\epsilon$-approximate solutions to these linear programs. We then briefly describe extensions of our techniques to the cases of multiple terminals wanting different functions, multiple computation schemas for a function, computation with a given desired precision, and networks with energy constraints at nodes.
I. Introduction

Motivated by sensor network applications, there has been significant interest in computing functions of distributed data inside the network. A typical scenario that is considered is as follows. Sensor nodes, distributed in a sensor field, can make measurements of their environment, perform reasonable amounts of computation and also communicate with other nodes. The interest of the sensor network is not so much in the measurement values made by the sensors but in some function of these variables, say $\Theta$. Since the nodes in the network can perform computation, they could participate in the computation of $\Theta$. Thus the interest is in distributed computation of a function of distributed data. This has also been called 'in-network function computation.' In this setting, it is typically assumed that the variables form a time sequence and that they can be generated at any rate; equivalently, an infinite sequence is readily available. Thus, in this setting it is natural to want to compute $\Theta$ at the best rate possible. In this paper, we introduce novel network flow techniques to design a computation and communication scheme that maximizes the rate at which $\Theta$ is computed. Though network flow techniques have been used widely to study multiple unicast [1]-[3] problems, our work develops such techniques for the first time for function computation.

Early work on in-network computation was on the asymptotic analysis of the number of transmissions needed to compute specific functions in noisy broadcast networks, e.g., [4]-[6]. In recent works, it is assumed that the node locations are from a realization of a suitable random point process, hence the resulting communication graph of the network is a random graph, e.g., [7]-[11]. In this setting a probabilistic characterization of the asymptotic (in the number of nodes) computation rate for different classes of functions, such as 'type-threshold functions' and 'type-sensitive functions' [8], has been obtained.

Another class of work considers simplistic networks with a small number of correlated sources [12]-[15]. Much of this work takes the information theoretic perspective in which the objective is to find encoding rate regions for reliably communicating the desired function. This class of work allows block coding to achieve better rates. There has been some recent work in the network coding literature on distributed function computation [16]-[18]. These works consider larger and more complex networks with independent sources. However, designing optimal coding schemes and finding capacity is a difficult problem except for very special functions or networks [16], [17].







Fig. 1. Computing $\Theta = X_1X_2 + X_3$ over a network. (a) A network to compute $\Theta = X_1X_2 + X_3$. (b) A possible embedding that computes $\Theta$ at unit rate. (c) An alternative embedding. (d) A schema to compute $\Theta$.

In this paper we make a significant departure from the above. We consider arbitrary functions of the distributed data for which a computation schema is described by a directed tree. A computation schema defines a sequence of operations to compute the function. An arbitrary communication network over which $\Theta$ is to be computed is assumed given. Our techniques work for networks with directed as well as undirected links with capacity constraints, though we present our results only for networks with undirected links. There are some similarities between our work and that on graph embedding, e.g., [19]-[21], but there are significant differences in the modeling assumptions and in the embedding objectives. Such work typically assumes the target network to be a 'regular network' like a hypercube or a mesh, and all link capacities are assumed equal. The embedding objective there is to minimize parameters like 'dilation.'



A. An Example and Motivation 

Let us consider the function $\Theta(X_1, X_2, X_3) = X_1X_2 + X_3$ of three variables generated at three sources $s_1$, $s_2$, and $s_3$ respectively. A terminal node $t$ is required to obtain the function $\Theta(X_1, X_2, X_3)$. We assume that all the three data symbols are from the same alphabet $\mathcal{A}$. The computation of the function can be broken into two parts, namely, first computing $X_1X_2$, and then adding $X_3$. These two operations can be done at different nodes in the network in the above order. This decomposition of the computation can be represented by the graph shown in Fig. 1(d). Such a graphical representation of the computation will henceforth be called a computation tree. Each edge represents a unique function of the source symbols.

Now consider computing $\Theta(X_1, X_2, X_3)$ in the network shown in Fig. 1(a), where each edge has unit capacity. There are multiple ways of receiving this function at the terminal $t$ depending on what computations are done at what nodes and along what paths the data flows. Two such ways of computing this function are shown in Figs. 1(b) and 1(c). These are called 'embeddings', defined formally in Sec. II. It is clear that intelligent time-sharing between these various embeddings may give a higher number of computations per use of the network on average than using only one such embedding. This raises the natural question: what is the maximum rate of computing that can be achieved on a given network, and how can it be achieved?



B. Organization and Summary of Contributions 

We begin by describing the model in detail in the next section. Section III presents the main contributions of this paper. Here we formulate a linear program, the Embedding-Edge LP, that optimally allocates flows on the embeddings. We then present another LP, the Node-Arc LP, based on a flow conservation law. This LP can be solved in polynomial time. We then describe an algorithm, Algorithm 1, that converts the flow rates obtained from the Node-Arc LP into a flow allocation on the embeddings. We then present a fast primal-dual algorithm which finds a solution that achieves at least a $(1 - \epsilon)$ fraction of the optimal rate. We call such a solution an $\epsilon$-approximate solution. This algorithm uses an oracle subroutine which finds a minimum cost embedding of the computation tree in the network. We provide an efficient algorithm, OptimalEmbedding(L), to obtain the same. This algorithm is also of independent interest. Four interesting extensions of our results are presented in Sec. IV. First, we allow multiple computation schemas for computing the same function. Then we consider multiple terminals computing distinct functions of disjoint sets of sources. For this problem, we modify our techniques to maximize the weighted sum-rate of computations, and also to maximize the rate-tuple in a given direction. In the third extension, we consider the problem of computing a function with a desired precision, which is achieved by allowing possibly different precision for each type of data. In the fourth extension, we consider a network with energy-constrained nodes, and assume that each type of data, i.e., each edge of the computation tree, requires some fixed but different amount of energy to compute/generate, transmit, and receive.

II. The model and the notation 

The communication network is an undirected, simple, connected graph $\mathcal{N} = (V, E)$ where $V$ is a set of $n$ nodes and $E$ is a set of $m$ undirected edges. Each edge $uv \in E$ represents a half duplex link with a total non-negative capacity $c(uv)$. In the network, $S = \{s_1, s_2, \ldots, s_\kappa\} \subset V$ is the set of $\kappa$ source nodes. Source $s_i$ has an infinite sequence of data values $\{X_i(k)\}_{k \ge 0}$ where $X_i(k)$ belongs to a finite alphabet $\mathcal{A}$. The link capacities are expressed in $|\mathcal{A}|$-ary units. $X_i$ is used to denote a representative element of the sequence. Let $X = [X_1, \ldots, X_\kappa]$. Without loss of generality, we assume that each source node in the network generates exactly one data sequence. If a source node generates two or more data sequences then this can be represented by multiple source nodes connected by infinite capacity links. We also assume that there is only one terminal node.

A given function $\Theta : \mathcal{A}^\kappa \to \mathcal{A}$ of $X$ needs to be obtained at the terminal node $t$ for each $k$ at the best possible rate. A computation schema for $\Theta$ is given and represented by a directed tree $\mathcal{G} = (\Omega, \Gamma)$ where $\Omega$ is the set of nodes and $\Gamma$ is the set of edges. The elements of $\Omega$ are labelled $\mu_1, \mu_2, \ldots, \mu_{|\Omega|}$ where $\mu_1, \mu_2, \ldots, \mu_\kappa$ are the source nodes, $\mu_{|\Omega|}$ is the terminal node that obtains $\Theta$, and the rest are computing nodes that compute different functions of $X$. Further, the nodes in $\Omega$ are labelled according to a topological order such that for $i > j$ there is no directed path in $\mathcal{G}$ from $\mu_i$ to $\mu_j$. The source nodes have in-degree zero and out-degree one, and the terminal node has in-degree one and out-degree zero. All other nodes have in-degree greater than one and out-degree exactly one. Similarly, the elements of $\Gamma$ are labelled $\theta_1, \theta_2, \ldots, \theta_{|\Gamma|}$ with $\theta_1, \theta_2, \ldots, \theta_\kappa$ being the outgoing edges from $\mu_1, \mu_2, \ldots, \mu_\kappa$ respectively, and $\theta_{|\Gamma|}$ being the incoming edge into $\mu_{|\Omega|}$. The remaining edges are labelled according to a topological order, i.e., for any $i < j$, there is no path from the head node of $\theta_j$ to the tail node of $\theta_i$. The nodes and edges of $\mathcal{G}$ can be labelled as above in $O(|\Gamma|) = O(\kappa)$ time.

For any edge $\theta \in \Gamma$, let $\mathrm{tail}(\theta)$ and $\mathrm{head}(\theta)$ represent, respectively, the tail and the head nodes of the edge $\theta$. Let $\Phi_\uparrow(\theta)$ and $\Phi_\downarrow(\theta)$ denote, respectively, the predecessors and the successors of $\theta$, i.e.,

$$\Phi_\uparrow(\theta) = \{\eta \in \Gamma \mid \mathrm{head}(\eta) = \mathrm{tail}(\theta)\} \quad \text{and}$$

$$\Phi_\downarrow(\theta) = \{\eta \in \Gamma \mid \mathrm{tail}(\eta) = \mathrm{head}(\theta)\}.$$

Each edge $\theta$ of $\mathcal{G}$ represents a distinct function of $X$ that can be computed from the functions corresponding to the edges in $\Phi_\uparrow(\theta)$. Further, each function takes values from the same alphabet $\mathcal{A}$. (We remark here that this is not unreasonable even when all the computations are over real numbers because computations are performed using a fixed precision.)
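As an illustration of the labelling and of the predecessor and successor sets, the following Python sketch encodes the computation tree of Fig. 1(d) for $X_1X_2 + X_3$. The dictionary layout and the names `edges`, `predecessors`, `successors` and `is_topological` are assumptions made for this example, not notation from the paper.

```python
# Hypothetical encoding of the computation tree for Theta = X1*X2 + X3.
# mu1..mu3 are source nodes, mu4 computes X1*X2, mu5 computes the sum,
# mu6 is the terminal. Edge theta_i is stored as i -> (tail, head).
edges = {
    1: ("mu1", "mu4"),  # theta_1 carries X1
    2: ("mu2", "mu4"),  # theta_2 carries X2
    3: ("mu3", "mu5"),  # theta_3 carries X3
    4: ("mu4", "mu5"),  # theta_4 carries X1*X2
    5: ("mu5", "mu6"),  # theta_5 carries X1*X2 + X3
}


def predecessors(theta):
    """Edges whose head is the tail of theta (the set Phi-up)."""
    tail, _ = edges[theta]
    return sorted(e for e, (_, head) in edges.items() if head == tail)


def successors(theta):
    """Edges whose tail is the head of theta (the set Phi-down)."""
    _, head = edges[theta]
    return sorted(e for e, (tail, _) in edges.items() if tail == head)


def is_topological(tree):
    """In a tree it suffices that every predecessor has a smaller index."""
    return all(p < e for e in tree for p in predecessors(e))
```

Here `predecessors(5)` lists the edges carrying $X_3$ and $X_1X_2$, matching the statement that each edge's function is computed from the functions on its predecessor edges.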

Let $N(v) = \{u \in V \mid uv \in E\}$ denote the set of neighbors of a node $v \in V$. We also denote the set of neighbors and the node itself by $N'(v) = N(v) \cup \{v\}$. A sequence of nodes $v_1, v_2, \ldots, v_l$, $l \ge 1$, is called a path if $v_i v_{i+1} \in E$ for $i = 1, 2, \ldots, l - 1$. The set of all paths in $\mathcal{N}$ is denoted by $\mathcal{P}$. With abuse of notation, for such a path $P$, we will say $v_i \in P$ and also $v_i v_{i+1} \in P$. The nodes $v_1$ and $v_l$ are called, respectively, the start node and the end node of $P$, and are denoted as $\mathrm{start}(P)$ and $\mathrm{end}(P)$.

As discussed in Sec. I, a function with a given computation tree can be computed along any "embedding" of the tree in the network, as shown in Fig. 1. We are now ready to formally define an embedding of a computation tree.

Definition: An embedding is a mapping $B : \Gamma \to \mathcal{P}$ such that

1) $\mathrm{start}(B(\theta_l)) = s_l$ for $l = 1, 2, \ldots, \kappa$

2) $\mathrm{end}(B(\eta)) = \mathrm{start}(B(\theta))$ if $\eta \in \Phi_\uparrow(\theta)$

3) $\mathrm{end}(B(\theta_{|\Gamma|})) = t$.

We denote the set of embeddings of $\mathcal{G}$ in $\mathcal{N}$ by $\mathcal{B}$. Our aim is to determine the flows on these embeddings so as to maximize the total flow. An edge in the network may carry different functions of the source data in an embedding. We thus define the number of times an edge $e \in E$ is used in an embedding $B$ as $r_B(e) = |\{\theta \in \Gamma \mid e \text{ is a part of } B(\theta)\}|$. Note that $r_B(e) \le |\Gamma|$ for any edge, and $r_B(e) = 0$ for an edge $e$ which is not used by the embedding $B$. Further, an edge may also be used to carry flows on different embeddings. Therefore in an assignment of flows on different embeddings, i.e., in a particular time-sharing scheme, the edge may carry multiple types of data (i.e., different functions of $X$) in different amounts.
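The three conditions of the definition and the edge-usage count $r_B(e)$ can be checked mechanically. The Python sketch below does so on a small hypothetical network (nodes `s1`, `s2`, `s3`, `a`, `t`), which is an assumption for illustration and not the network of Fig. 1(a); the names `is_embedding`, `edge_usage`, `B1` and `B2` are likewise illustrative.

```python
from collections import Counter

# Tree edges theta_i as i -> (tail, head) for Theta = X1*X2 + X3.
TREE = {1: ("mu1", "mu4"), 2: ("mu2", "mu4"), 3: ("mu3", "mu5"),
        4: ("mu4", "mu5"), 5: ("mu5", "mu6")}
SOURCES = {1: "s1", 2: "s2", 3: "s3"}   # theta_l must start at source s_l
TERMINAL = "t"
# Hypothetical undirected network; each edge is a frozenset {u, v}.
NET = {frozenset(e) for e in
       [("s1", "a"), ("s2", "a"), ("s3", "a"), ("s3", "t"), ("a", "t")]}


def preds(theta):
    return [e for e, (_, head) in TREE.items() if head == TREE[theta][0]]


def path_edges(path):
    """Set of network edges on a path given as a list of nodes."""
    return {frozenset(p) for p in zip(path, path[1:])}


def is_embedding(B):
    """Check that B maps every tree edge to a path obeying conditions 1)-3)."""
    if any(not path_edges(p) <= NET for p in B.values()):
        return False                                        # not a path in the network
    if any(B[l][0] != s for l, s in SOURCES.items()):
        return False                                        # condition 1
    if any(B[eta][-1] != B[th][0] for th in TREE for eta in preds(th)):
        return False                                        # condition 2
    return B[max(TREE)][-1] == TERMINAL                     # condition 3


def edge_usage(B):
    """r_B(e): number of tree edges theta whose path B(theta) contains e."""
    return Counter(e for path in B.values() for e in path_edges(path))


# X1*X2 is computed at a, the sum at t; X3 travels directly from s3 to t.
B1 = {1: ["s1", "a"], 2: ["s2", "a"], 3: ["s3", "t"], 4: ["a", "t"], 5: ["t"]}
# Same computations, but X3 is routed through a, so edge a-t carries two types.
B2 = {1: ["s1", "a"], 2: ["s2", "a"], 3: ["s3", "a", "t"], 4: ["a", "t"], 5: ["t"]}
```

In `B2` the edge between `a` and `t` carries both $X_3$ and $X_1X_2$, so its usage count is 2, whereas in `B1` every used edge has count 1.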



Note that, if $\mathrm{start}(B(\theta_i)) = \mathrm{end}(B(\theta_i))$, i.e., if $B(\theta_i)$ consists of a single node, then in that embedding the data $\theta_i$ is generated as well as used (i.e., not forwarded to another node) at that node.

III. Linear programs and algorithms

In this section, we present our main contributions.

• In Section III-A, we give a basic linear program, the Embedding-Edge LP, which characterizes our problem.

• In Section III-B, we give an alternate LP, the Node-Arc LP, that can be solved in polynomial time. We then present an algorithm which obtains a solution of the Embedding-Edge LP with the same rate from a solution of the Node-Arc LP.

• Drawing parallels from multi-commodity flow techniques, we give, in Section III-C, the dual of our Embedding-Edge LP and present a fast primal-dual algorithm to compute an $\epsilon$-approximate solution. This algorithm needs a subroutine which finds a 'minimum weight embedding' of the computation tree in the network for given edge-weights. We present an efficient exact algorithm for this purpose. This algorithm is of independent interest, for instance, for computing functions over a network whose links are power limited but have infinite bandwidth.

A. The Embedding-Edge LP

As discussed in Sec. I and Sec. II, the function for a particular sample of the data can be computed over the network using any embedding of the computation tree in the network. Let $\mathcal{B}$ be the set of all embeddings of $\mathcal{G}$ in $\mathcal{N}$. For any embedding $B \in \mathcal{B}$, let $x(B)$ denote the average number of function symbols computed using the embedding $B$ per use of the network. We present below a linear program to maximize the computation rate $\lambda = \sum_{B \in \mathcal{B}} x(B)$. Recall that $r_B(e)$ represents the number of times the edge $e$ is used in the embedding $B$.

Embedding-Edge LP: Maximize $\lambda = \sum_{B \in \mathcal{B}} x(B)$ subject to

1. Capacity constraints

$$\sum_{B \in \mathcal{B}} r_B(e)\, x(B) \le c(e), \quad \forall e \in E \tag{1}$$

2. Non-negativity constraints

$$x(B) \ge 0, \quad \forall B \in \mathcal{B} \tag{2}$$
Fig. 2. The aggregate edge-flow values for a flow of 0.5 on the embedding in Fig. 1(c) and a flow of 1 on the embedding in Fig. 1(b).

This LP finds an optimal fractional packing of the embeddings of $\mathcal{G}$ into $\mathcal{N}$. Similar formulations have been considered widely in the literature in the context of multi-commodity flow [2], [22] and other packing problems [2].

In multi-commodity flow problems, a solution of the so-called Path-Edge LP readily gives a way of achieving the corresponding rates. However, since in our problem the data is to be mixed according to different embeddings for different realizations of data, one needs to carefully devise a protocol to schedule the computation and communication at the nodes and edges in such a way that data from different realizations are not mixed. Such a protocol is presented in the appendix.

B. The Node-Arc LP 

Note that the cardinality of $\mathcal{B}$ can be exponential in $|V|$. Hence the complexity of the Embedding-Edge LP is exponential in the network parameters unless further structure of the problem is exploited. In the multi-commodity flow literature, another LP formulation based on the flow conservation principle, called the Node-Arc LP, is well known and can be solved in polynomial time. In the following, we formulate a node conservation based LP for our problem. For this LP, we assume that each node in the network has a virtual self-loop of infinite capacity. The data flowing in the self-loop represents the data generated at that node. This may be the source data generated at the sources or the intermediate or final values computed at the node. For example, if a node computes $X_1X_2$ from the $X_1$ and $X_2$ it receives, and then computes $X_1X_2 + X_3$ by using the computed $X_1X_2$ and the received $X_3$, then both $X_1X_2$ and $X_1X_2 + X_3$ will be assumed to be flowing in its self-loop. An example of the flows on the edges and the self-loops corresponding to a particular flow assignment on two embeddings is shown in Fig. 2.

The variables in our Node-Arc LP are

$$\{f^\theta_{uv}, f^\theta_{vu} \mid uv \in E, \theta \in \Gamma\} \cup \{f^\theta_{uu} \mid u \in V, \theta \in \Gamma\} \cup \{\lambda\},$$

where $f^\theta_{uv}$ represents the flow of type $\theta \in \Gamma$ flowing through the edge $uv \in E$ from $u$ to $v$, $f^\theta_{uu}$ denotes the flow of type $\theta$ flowing in the self-loop at $u$, and $\lambda$ represents the total rate of the function computation.

The linear program consists of a capacity constraint on the edges of $\mathcal{N}$, a flow-conservation rule on the nodes of $\mathcal{N}$, and a non-negativity constraint on the flows $f^\theta_{uv}$. The flow conservation rule is based on the fact that an intermediate node in $\mathcal{N}$ can, apart from forwarding the flows it receives, generate a flow of type $\theta$ on its self-loop by terminating the same amount of incoming flow of each type $\eta \in \Phi_\uparrow(\theta)$. Each source node $s_l$, in addition, generates $\lambda$ amount of flow of type $\theta_l$. Similarly, the terminal node $t$ terminates $\lambda$ amount of flow of type $\theta_{|\Gamma|}$. The Node-Arc LP is as follows. Recall that $N'(v)$ denotes the set of the neighbors of $v$ together with $v$ itself.



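Node-Arc LP: Maximize $\lambda$ subject to the following constraints for any node $v \in V$

1. Functional conservation of flows:

$$f^\eta_{vv} + \sum_{u \in N(v)} f^\theta_{vu} - \sum_{u \in N'(v)} f^\theta_{uv} = 0, \quad \forall \theta \in \Gamma \setminus \{\theta_{|\Gamma|}\} \text{ and } \forall \eta \in \Phi_\downarrow(\theta). \tag{3}$$

2. Conservation and termination of $\theta_{|\Gamma|}$:

$$\sum_{u \in N(v)} f^{\theta_{|\Gamma|}}_{vu} - \sum_{u \in N'(v)} f^{\theta_{|\Gamma|}}_{uv} = \begin{cases} -\lambda & v = t \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

3. Generation of $\theta_l$, $\forall l \in \{1, 2, \ldots, \kappa\}$:

$$f^{\theta_l}_{vv} = \begin{cases} \lambda & v = s_l \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

4. Capacity constraints

$$\sum_{\theta \in \Gamma} \left(f^\theta_{uv} + f^\theta_{vu}\right) \le c(uv), \quad \forall uv \in E. \tag{6}$$

5. Non-negativity constraints

$$f^\theta_{uv} \ge 0, \quad \forall uv \in E \text{ and } \forall \theta \in \Gamma \tag{7}$$
$$f^\theta_{uu} \ge 0, \quad \forall u \in V \text{ and } \forall \theta \in \Gamma \tag{8}$$
$$\lambda \ge 0. \tag{9}$$

This LP has $O(\kappa m)$ variables, $O(\kappa m)$ non-negativity constraints (one for each variable), and $O(\kappa n + m)$ other constraints. Hence it can be solved in polynomial time.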
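To make the conservation rules concrete, the Python sketch below builds the edge and self-loop flows induced by a single embedding carrying rate $x$ and checks them against the conservation, termination, generation and capacity constraints with $\lambda = x$. The network, the unit capacities, and the names `flows_from_embedding` and `satisfies_node_arc` are assumptions for illustration; the conservation check is written in the equivalent form "flow of type $\theta$ entering $v$ (including its self-loop) equals flow of type $\theta$ leaving $v$ plus the flow of the successor type generated at $v$."

```python
TREE = {1: ("mu1", "mu4"), 2: ("mu2", "mu4"), 3: ("mu3", "mu5"),
        4: ("mu4", "mu5"), 5: ("mu5", "mu6")}
SOURCES = {1: "s1", 2: "s2", 3: "s3"}
TERMINAL = "t"
# Hypothetical network with unit capacities (not the one in Fig. 1(a)).
CAP = {("s1", "a"): 1.0, ("s2", "a"): 1.0, ("s3", "a"): 1.0,
       ("s3", "t"): 1.0, ("a", "t"): 1.0}
NODES = sorted({u for e in CAP for u in e})
NBRS = {v: [u for e in CAP for u in e if v in e and u != v] for v in NODES}


def successor(theta):
    """The unique tree edge leaving the head node of theta."""
    head = TREE[theta][1]
    return next(e for e, (tail, _) in TREE.items() if tail == head)


def flows_from_embedding(B, x):
    """Flows f[(theta, u, v)] induced by sending rate x along embedding B."""
    f = {}
    for th, path in B.items():
        f[(th, path[0], path[0])] = x          # self-loop where theta is generated
        for u, v in zip(path, path[1:]):
            f[(th, u, v)] = f.get((th, u, v), 0.0) + x
    return f


def satisfies_node_arc(f, lam, tol=1e-9):
    g = lambda th, u, v: f.get((th, u, v), 0.0)
    last = max(TREE)
    for v in NODES:
        for th in TREE:
            out_flow = sum(g(th, v, u) for u in NBRS[v])
            in_flow = g(th, v, v) + sum(g(th, u, v) for u in NBRS[v])
            if th == last:   # termination of the final type at the terminal
                balance = out_flow - in_flow + (lam if v == TERMINAL else 0.0)
            else:            # theta is forwarded or consumed to generate its successor
                balance = g(successor(th), v, v) + out_flow - in_flow
            if abs(balance) > tol:
                return False
        for l, s in SOURCES.items():   # generation of source types
            if abs(g(l, v, v) - (lam if v == s else 0.0)) > tol:
                return False
    return all(sum(g(th, u, v) + g(th, v, u) for th in TREE) <= c + tol
               for (u, v), c in CAP.items())


B1 = {1: ["s1", "a"], 2: ["s2", "a"], 3: ["s3", "t"], 4: ["a", "t"], 5: ["t"]}
B2 = {1: ["s1", "a"], 2: ["s2", "a"], 3: ["s3", "a", "t"], 4: ["a", "t"], 5: ["t"]}
```

With unit capacities, `B2` is feasible at rate 0.5 but not at rate 1, because the edge between `a` and `t` carries two data types in that embedding.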

The above LP gives a set of flow values on each link. Now we briefly describe and present an algorithm, Algorithm 1, which, from any feasible solution of this LP, obtains a corresponding feasible solution for the Embedding-Edge LP that achieves the same $\lambda$.

Each iteration of the while loop finds an embedding with a non-zero flow and removes the corresponding edge-flows to obtain another feasible solution with a reduced rate. This continues until $\lambda$ amount of flow has been extracted. The $i$-th iteration of the for loop finds the mapping of $\theta_i$ in the embedding. While exploring the nodes to find the mapping of $\theta_i$, it checks for the presence of a cycle of flow of type $\theta_i$. It removes such a cycle if detected.

Algorithm 1: Finding an equivalent solution of the Embedding-Edge LP from a feasible solution of the Node-Arc LP.

input : Network graph $\mathcal{N} = (V, E)$, capacities $c(e)$, set of source nodes $S$, terminal node $t$, computation tree $\mathcal{G} = (\Omega, \Gamma)$, and a feasible solution to its Node-Arc LP that consists of the values of $\lambda$, $f^\theta_{uv}$ $\forall \theta \in \Gamma$, $\forall uv \in E$, and $f^\theta_{uu}$ $\forall \theta \in \Gamma$, $\forall u \in V$.
output: Solution $\{x(B) \mid B \in \mathcal{B}\}$ to the Embedding-Edge LP with $\sum_{B \in \mathcal{B}} x(B) = \lambda$.

Initialize $x(B) := 0$, $B(\theta_i) := \emptyset$ (the null sequence), $\forall B \in \mathcal{B}$ and $\forall \theta_i \in \Gamma$, $\lambda' := 0$ ;
while $\lambda' \ne \lambda$ do
    $z(t) := \lambda$ ;
    $B(\theta_{|\Gamma|}) := t$ ;
    for $i := |\Gamma|$ to $1$ do
        $v := B(\theta_i)$ ; // valid, as $B(\theta_i)$ consists of only one node at this step
        $u :=$ an element in $N'(v)$ such that $f^{\theta_i}_{uv} > 0$ ;
        if $u \ne v$ and $u \in B(\theta_i)$ then
            // A cycle of redundant flow found: remove the flow from all the edges in the cycle
            Let $P$ be the path in $B(\theta_i)$ up to the first appearance of $u$ in it ;
            Delete $P$ from $B(\theta_i)$ ;
            $y := \min_{u'v' \in \{uv\} \cup P} f^{\theta_i}_{u'v'}$ ;
            $f^{\theta_i}_{u'v'} := f^{\theta_i}_{u'v'} - y$, $\forall u'v' \in \{uv\} \cup P$ ;
        end
        else
            $z(u) := \min(z(v), f^{\theta_i}_{uv})$ ;
        end
        if $u \ne v$ then
            Prefix $u$ in $B(\theta_i)$ ;
            $v := u$ ;
            Jump to the second statement inside the for loop ;
        end
        else
            $B(\eta) := u$, $\forall \eta \in \Phi_\uparrow(\theta_i)$ ;
        end
    end
    $x(B) := z(s_1)$ ; // Flow extracted on $B$
    $\lambda' := \lambda' + x(B)$ ; // Total flow extracted
    // Remove $x(B)$ amount of flow from all the edges in $B$.
    $f^\theta_{uv} := f^\theta_{uv} - x(B)$, $\forall \theta \in \Gamma$ and $\forall uv \in B(\theta)$ ;
    // Remove $x(B)$ amount of flow from all the relevant self-loops.
    $f^\theta_{uu} := f^\theta_{uu} - x(B)$, $\forall \theta \in \Gamma$ and $u = \mathrm{start}(B(\theta))$ ;
end

Proof of correctness of Algorithm 1: The proof of the following statements ensures the correctness of the algorithm.

1) In the third line inside the for loop, there exists a $u \in N'(v)$ such that $f^{\theta_i}_{uv} > 0$.

2) If a cycle of redundant flow is found and removed in the first if block inside the for loop, then the remaining flows still satisfy the constraints in the LP with $\lambda$ replaced by $\lambda - \lambda'$.

3) At the end of each iteration of the while loop, the remaining flows still satisfy the constraints in the LP with $\lambda$ replaced by $\lambda - \lambda'$.

4) The algorithm terminates in finite time.

We now outline a proof of each of these statements. We prove the statements 1)-3) for a certain iteration of the loops while assuming that all the above claims are true in all the previous iterations of the while and for loops.

Proof of 1): The current values of the flows satisfy all the constraints in the Node-Arc LP with $\lambda$ replaced by $\lambda - \lambda'$. The algorithm ensures that in this step, the total outgoing flow $\sum_{u \in N(v)} f^{\theta_i}_{vu} \ge z(v) > 0$. So, by constraints (3) and (4), the total of incoming and generated flows $\sum_{u \in N'(v)} f^{\theta_i}_{uv} > 0$. Hence the statement follows.

Proof of 2): We will prove that a cyclic flow on a cycle $v_1, v_2, \ldots, v_l, v_1$ satisfies all the constraints in the Node-Arc LP with $\lambda = 0$. Then clearly after subtracting this flow from the edges of the cycle, the remaining flows in the network will still satisfy the constraints with the same $\lambda$ as before. For a cyclic flow of type $\theta$ of volume $y$, the flow values are $f^\theta_{v_i v_{i+1}} = y$ for $i = 1, 2, \ldots, l - 1$, $f^\theta_{v_l v_1} = y$, and all other flow values are equal to 0. So, for any node, any nonzero incoming flow is 'compensated' by the same amount of outgoing flow of the same type. All flow values in the self-loops are also 0. So clearly these flows satisfy the constraints in the LP with $\lambda = 0$. This completes the proof.

Proof of 3): Again, we will prove that the removed $x(B)$ amount of flows on the edges of an embedding and on the self-loops themselves satisfy the constraints in the LP with $\lambda = x(B)$. Then the remaining flows will also satisfy the constraints with $\lambda$ replaced by $\lambda - x(B)$. The subtracted flow values are $f^\theta_{uv} = x(B)$ for $uv \in B(\theta)$, $f^\theta_{uu} = x(B)$ for $u = \mathrm{start}(B(\theta))$, and all other flow values are 0. We can verify that these flows satisfy the constraints in the Node-Arc LP.

Proof of 4): The Node-Arc LP has $O(m|\Gamma|)$ variables $f^\theta_{uv}$ and $f^\theta_{uu}$. Each deletion of flows through a cycle, or through an embedding, makes at least one of these variables zero. Since the number of steps in each iteration is finite, the algorithm ends in finite time. ■

It can be checked that the overall complexity of Algorithm 1 is $O(\kappa^2 m^2)$.

C. Primal-dual algorithm and min-cost embedding 

The Node-Arc LP and the subsequent algorithm to find an optimal solution of the Embedding-Edge LP have polynomial-time complexity. For the multi-commodity flow problem, and for more general packing problems, Garg and Könemann gave a faster primal-dual algorithm to find an $\epsilon$-approximate solution. The algorithm uses a hypothetical subroutine/oracle. For the multi-commodity flow problem, the subroutine finds the shortest paths between the source-terminal pairs. We now give a similar fast algorithm to find an $\epsilon$-approximate solution to the Embedding-Edge LP.

We first provide the dual of the Embedding-Edge LP. The dual has the variables $L = (l(e))_{e \in E}$ corresponding to the capacity constraints in the primal. The dual LP is given as follows.

Dual of Embedding-Edge LP: Minimize $D(L) = \sum_{e \in E} c(e)\, l(e)$ subject to

1. Constraints corresponding to each $x(B)$ in the primal:

$$\sum_{e \in E} r_B(e)\, l(e) \ge 1, \quad \forall B \in \mathcal{B} \tag{10}$$

2. Non-negativity constraints:

$$l(e) \ge 0, \quad \forall e \in E \tag{11}$$

We define the weight of an embedding $B$ as

$$w_L(B) = \sum_{e \in E} r_B(e)\, l(e).$$

It can be checked (similar to [2]) that the dual LP is equivalent to finding $\min_L D(L)/\alpha(L)$, where

$$\alpha(L) = \min_{B \in \mathcal{B}} w_L(B)$$

is the cost of the minimum cost embedding for $L$.

The Embedding-Edge LP is a fractional packing LP of the type considered by Garg and Könemann [2] and Plotkin et al. [23]. A polynomial time primal-dual algorithm was presented in [2] for such LPs assuming the existence of an efficient oracle subroutine which finds a 'shortest path.' For a packing LP $\max\{a^T x \mid Ax \le b, x \ge 0\}$ and its dual LP $\min\{b^T y \mid A^T y \ge a, y \ge 0\}$, the shortest path is defined as $\arg\min_j \sum_i A(i, j)\, y(i)/a(j)$ [2]. It is easy to see that for our LP, the 'shortest path' corresponds to the embedding with minimum weight, $\arg\min_B w_L(B)$. Algorithm 2 gives the instance of the primal-dual algorithm for our problem.



Algorithm 2: Algorithm for finding approximately optimal $x$ and $\lambda$

input : Network graph $\mathcal{N} = (V, E)$, capacities $c(e)$, set of source nodes $S$, terminal node $t$, computation tree $\mathcal{G} = (\Omega, \Gamma)$, the desired accuracy $\epsilon$
output: Primal solution $\{x(B), B \in \mathcal{B}\}$

Initialize $l(e) := \delta/c(e)$, $\forall e \in E$, $x(B) := 0$, $\forall B \in \mathcal{B}$ ;
while $D(L) < 1$ do
    $B^* :=$ OptimalEmbedding($L$) ; // OptimalEmbedding($L$) outputs $\arg\min_B w_L(B)$
    $e^* :=$ edge in $B^*$ with smallest $c(e)/r_{B^*}(e)$ ;
    $x(B^*) := x(B^*) + c(e^*)/r_{B^*}(e^*)$ ;
    $l(e) := l(e)\left(1 + \epsilon\, \frac{c(e^*)/r_{B^*}(e^*)}{c(e)/r_{B^*}(e)}\right)$, $\forall e \in B^*$ ;
end
$x(B) := x(B)/\log_{1+\epsilon}\frac{1+\epsilon}{\delta}$, $\forall B$ ;
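The loop of Algorithm 2 can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: the oracle is replaced by a brute-force minimum over an explicitly supplied list of embeddings (each given only by its usage counts $r_B(e)$), and the value $\delta = (1+\epsilon)((1+\epsilon)m)^{-1/\epsilon}$ is an assumption taken from the Garg and Könemann analysis of packing LPs, since the text above does not state it. The names `primal_dual`, `usages` and `cap`, and the toy instance, are illustrative.

```python
import math


def primal_dual(usages, cap, eps):
    """usages: list of dicts edge -> r_B(e); cap: dict edge -> c(e)."""
    m = len(cap)
    # Assumed choice of delta (Garg-Konemann style); not given in the text.
    delta = (1 + eps) * ((1 + eps) * m) ** (-1.0 / eps)
    length = {e: delta / c for e, c in cap.items()}
    x = [0.0] * len(usages)

    def dual_value():
        return sum(cap[e] * length[e] for e in cap)

    while dual_value() < 1.0:
        # Brute-force stand-in for OptimalEmbedding(L): minimum-weight embedding.
        b = min(range(len(usages)),
                key=lambda i: sum(r * length[e] for e, r in usages[i].items()))
        r_b = usages[b]
        bottleneck = min(cap[e] / r for e, r in r_b.items())
        x[b] += bottleneck
        for e, r in r_b.items():
            # The bottleneck edge grows by a factor (1 + eps), others by less.
            length[e] *= 1 + eps * bottleneck / (cap[e] / r)

    scale = math.log((1 + eps) / delta, 1 + eps)
    return [xi / scale for xi in x]


# Hypothetical instance: two embeddings on a five-edge unit-capacity network.
cap = {"s1-a": 1.0, "s2-a": 1.0, "s3-a": 1.0, "s3-t": 1.0, "a-t": 1.0}
usages = [
    {"s1-a": 1, "s2-a": 1, "s3-t": 1, "a-t": 1},   # X3 sent directly to t
    {"s1-a": 1, "s2-a": 1, "s3-a": 1, "a-t": 2},   # X3 routed through a
]
x = primal_dual(usages, cap, eps=0.1)
rate = sum(x)
loads = {e: sum(x[i] * usages[i].get(e, 0) for i in range(len(usages))) for e in cap}
```

On this instance the optimal rate is 1, since the single unit-capacity edge leaving `s1` must carry $X_1$; the scaled output stays within the capacities and achieves a rate close to 1.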



We now describe, and then provide below, the subroutine OptimalEmbedding(L) which finds a minimum weight embedding of $\mathcal{G}$ on $\mathcal{N}$ with a given length function $L$. For each edge $\theta_i$, starting from $\theta_1$, it finds a way to compute $\theta_i$ at each network node at the minimum cost possible. It keeps track of that minimum cost and also the 'predecessor' node from where it receives $\theta_i$. If $\theta_i$ is computed at that node itself then the predecessor node is itself. This is done for each $\theta_i$ by a technique similar to Dijkstra's algorithm. Computing $\theta_i$ for $i \in \{1, 2, \ldots, \kappa\}$ at the minimum cost at a node $u$ is equivalent to finding the shortest path to $u$ from $s_i$. We do this by using Dijkstra's algorithm. For any other $i$, the node $u$ can either compute $\theta_i$ from $\Phi_\uparrow(\theta_i)$ or receive it from one of its neighbors. To take this into account, unlike Dijkstra's algorithm, we initialize the cost of computing $\theta_i$ with the cost of computing $\Phi_\uparrow(\theta_i)$ at the same node. With this initialization, the same principle of greedy node selection and cost update as in Dijkstra's algorithm is used to find the optimal way of obtaining $\theta_i$ at all the nodes. Finally, the optimal embedding is obtained by backtracking the predecessors. Starting from $t$, we backtrack using the predecessors from which $\theta_{|\Gamma|}$ is obtained, till we hit a node whose predecessor is itself. This node is the start node of $B(\theta_{|\Gamma|})$ and the end node of $B(\eta)$ for all $\eta \in \Phi_\uparrow(\theta_{|\Gamma|})$. The complete embedding is obtained by continuing this process for each $\theta_i$ in the reverse topological order.

Procedure OptimalEmbedding(L)

input : Network graph $\mathcal{N} = (V, E)$, length function $L$, set of source nodes $S$, terminal node $t$, computation tree $\mathcal{G} = (\Omega, \Gamma)$.
output: Embedding $B^*$ with minimum weight under $L$

for $i := 1$ to $|\Gamma|$ do
    if $i \in \{1, 2, \ldots, \kappa\}$ then
        $\omega_u(\theta_i) := \infty$, $\forall u \in V - \{s_i\}$ ;
        $\omega_{s_i}(\theta_i) := 0$ and $\sigma_{s_i}(\theta_i) := s_i$ ;
    end
    else
        $\omega_u(\theta_i) := \sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta)$, $\forall u \in V$ ;
        $\sigma_u(\theta_i) := u$, $\forall u \in V$ ;
    end
    $\Psi := \emptyset$ ; $\Psi' := V$ ;
    while $|\Psi| < n$ do
        $v := \arg\min_{u \in \Psi'} \omega_u(\theta_i)$ ;
        $\Psi := \Psi \cup \{v\}$ ;
        $\Psi' := \Psi' - \{v\}$ ;
        foreach $u \in N(v)$ do
            if $\omega_v(\theta_i) + l(uv) < \omega_u(\theta_i)$ then $\omega_u(\theta_i) := \omega_v(\theta_i) + l(uv)$ and $\sigma_u(\theta_i) := v$ ;
        end
    end
end
$B^*(\theta_{|\Gamma|}) := t$ ;
for $i := |\Gamma|$ to $1$ do
    $u := B^*(\theta_i)$ ; // valid, as $B^*(\theta_i)$ consists of only a node at this step
    while $\sigma_u(\theta_i) \ne u$ do
        Prefix $\sigma_u(\theta_i)$ to $B^*(\theta_i)$ ;
        $u := \sigma_u(\theta_i)$ ;
    end
    $B^*(\eta) := u$, $\forall \eta \in \Phi_\uparrow(\theta_i)$ ;
end

Correctness of OptimalEmbedding(L): It is sufficient to show that, during each phase $i$, the algorithm computes optimal values for $\omega_u(\theta_i)$ and $\sigma_u(\theta_i)$, for each node $u$ in $\mathcal{N}$. We prove this by induction on the pair $(i, |\Psi|)$ according to the lexicographic ordering. For $i \in \{1, \ldots, \kappa\}$ and for all $|\Psi|$, this follows from the correctness of Dijkstra's algorithm. Now, assuming the optimality of $\omega_u(\theta_i)$ and $\sigma_u(\theta_i)$ for all iterations before $(i, |\Psi|)$, we prove the statement for $(i, |\Psi|)$. Suppose $v$ is the element added to $\Psi$ in the current iteration. We consider two cases:

Case 1: $\Psi = \{v\}$: The cost of computing (and not receiving from another node) $\theta_i$ at any node $u$ is $\sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta)$. The algorithm chooses $v$ which has the minimum $\sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta)$ among all nodes $u \in V$ and assigns $\omega_v(\theta_i) = \sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_v(\eta)$ and $\sigma_v(\theta_i) = v$. If these are not optimal, then it must be more efficient for $v$ to receive $\theta_i$ which is computed at some other node $u$. But that implies $\sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta) < \sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_v(\eta)$, which is a contradiction to the choice of $v$.

Case 2: $\{v\} \subset \Psi$: Suppose there is a more efficient way of receiving $\theta_i$ at $v$ than from the node selected as $\sigma_v(\theta_i)$, and that is to compute $\theta_i$ at a node $u$ and receive it along a path $P_{u,v}$. Let the corresponding cost be $\omega'_v(\theta_i)$. First, if $u \in \Psi'$ then the present cost ($\le \sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta)$) at $u$ is less than the present value of $\omega_v(\theta_i)$, which is a contradiction to the choice of $v$. Thus $u \in \Psi$. Let $u'$ be the last node in $P_{u,v}$ from $\Psi$, and $v'$ be the first node in $P_{u,v}$ from $\Psi'$. Then $\omega'_v(\theta_i) \ge \omega_{u'}(\theta_i) + l(u'v') \ge \omega_{v'}(\theta_i) \ge \omega_v(\theta_i)$, a contradiction. Here the first inequality follows since $u' \in \Psi$. The second inequality follows from the update rule followed during the inclusion of $u'$ in $\Psi$. The last inequality follows from the choice of $v$.
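A runnable Python sketch of the procedure is given below for concreteness. It follows the per-edge Dijkstra-like phases and the backtracking step described above, but it is an illustration under assumptions: it uses a binary heap with lazy deletion from `heapq` rather than a Fibonacci heap, and the function name, argument layout and toy network are hypothetical.

```python
import heapq


def optimal_embedding(adj, length, tree, sources, terminal):
    """adj: node -> list of neighbours; length: frozenset({u, v}) -> l(uv);
    tree: index i -> (tail, head) of theta_i, indices in topological order;
    sources: index l -> source node s_l. Returns (minimum weight, embedding)."""
    preds = {th: [e for e, (_, h) in tree.items() if h == tree[th][0]] for th in tree}
    omega, sigma = {}, {}
    for th in sorted(tree):
        if th in sources:       # plain shortest paths from the source
            cost = {u: float("inf") for u in adj}
            cost[sources[th]] = 0.0
        else:                   # start from the cost of computing theta locally
            cost = {u: sum(omega[eta][u] for eta in preds[th]) for u in adj}
        pred = {u: u for u in adj}
        heap = [(c, u) for u, c in cost.items()]
        heapq.heapify(heap)
        done = set()
        while heap:
            c, v = heapq.heappop(heap)
            if v in done or c > cost[v]:
                continue        # stale heap entry
            done.add(v)
            for u in adj[v]:
                new = c + length[frozenset((u, v))]
                if new < cost[u]:
                    cost[u], pred[u] = new, v
                    heapq.heappush(heap, (new, u))
        omega[th], sigma[th] = cost, pred
    last = max(tree)
    B = {last: [terminal]}
    for th in sorted(tree, reverse=True):   # backtrack in reverse topological order
        u = B[th][0]
        while sigma[th][u] != u:
            u = sigma[th][u]
            B[th].insert(0, u)
        for eta in preds[th]:
            B[eta] = [u]        # predecessors must end where theta is computed
    return omega[last][terminal], B


# Hypothetical instance: unit lengths on a five-edge network.
TREE = {1: ("mu1", "mu4"), 2: ("mu2", "mu4"), 3: ("mu3", "mu5"),
        4: ("mu4", "mu5"), 5: ("mu5", "mu6")}
EDGES = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("s3", "t"), ("a", "t")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)
LENGTH = {frozenset(e): 1.0 for e in EDGES}
cost, B = optimal_embedding(ADJ, LENGTH, TREE, {1: "s1", 2: "s2", 3: "s3"}, "t")
weight = sum(LENGTH[frozenset(p)] for path in B.values() for p in zip(path, path[1:]))
```

On this instance the minimum weight is 4, and the weight of the returned embedding, summed over the paths of all tree edges, agrees with the cost reported by the forward pass.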

Complexity of OptimalEmbedding(L) and the primal-dual algorithm: Let us consider the first for loop in OptimalEmbedding(L). Each iteration of this loop is the same as Dijkstra's algorithm except for the initialization. Thus, each iteration of the for loop, excluding the initialization step, can be run in $O(m + n \log n)$ time using a Fibonacci heap implementation. The initialization step requires $O(n|\Phi_\uparrow(\theta_i)|)$ time for each iteration. The second for loop has $O(\kappa n)$ complexity. So the overall algorithm takes $O(\kappa(m + n \log n))$ time.

The number of iterations in the primal-dual algorithm is of the order $O(\epsilon^{-1} m \log_{1+\epsilon}(m))$. Thus the overall complexity of the algorithm is $O(\epsilon^{-1} \kappa m (m + n \log n) \log_{1+\epsilon}(m))$.

IV. Extensions 

1. Multiple trees for the same function: It may be possible to compute a function in different sequences
of operations which are expressed by different computation trees. For example, the 'sum' function
$f(X_1, X_2, X_3) = X_1 + X_2 + X_3$ may be computed by any of the computation sequences $((X_1 + X_2) + X_3)$,
$(X_1 + (X_2 + X_3))$, or $(X_2 + (X_1 + X_3))$. In general, suppose multiple computation trees $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_\nu$ are
given for computing the same function. Let $\mathcal{B}_i$ denote the set of all embeddings of $\mathcal{G}_i$ for $i = 1, 2, \ldots, \nu$.
Let $\mathcal{B} = \cup_i \mathcal{B}_i$ denote the set of all embeddings. Under this definition of $\mathcal{B}$, the Embedding-Edge LP for
this problem is the same as that for a single tree. The new OptimalEmbedding(L) algorithm finds an
optimal embedding for each $\mathcal{G}_i$ and chooses the one with minimum weight as the optimal embedding in
$\mathcal{B}$. This can be used in the same primal-dual algorithm to find an $\epsilon$-approximate solution.

Some edges of different trees may represent an identical function of the sources. For example, for the
function $X_1 + X_2 + X_3 + X_4$, an edge corresponding to the function $X_1 + X_2$ is present in each of the trees
corresponding to $(((X_1 + X_2) + X_3) + X_4)$, $((X_1 + X_2) + (X_3 + X_4))$, and $(((X_1 + X_2) + X_4) + X_3)$.
For this reason, the OptimalEmbedding(L) algorithm can be made more efficient by running iterations for
each function rather than each edge. The initialization of $\omega_u(\theta)$ changes correspondingly, to take into
account all possible ways of computing that function. The rest of the algorithm remains the same.

The particular function $\Theta(X_1, X_2, \ldots, X_\kappa) = X_1 + X_2 + \ldots + X_\kappa$ is of special theoretical as well as
practical interest. There are many, of the order of $\kappa!$, sequences of additions of data and corresponding trees
to get this function. With the above modification, our OptimalEmbedding(L) algorithm has complexity
exponential in $\kappa$ and linear in $m$. As a result, our primal-dual algorithm gives an $\epsilon$-approximate solution
with complexity exponential in $\kappa$ and quadratic in $m$. The problem is equivalent to the much investigated
multicast problem. For this problem, and consequently for the function 'sum', the oracle finds a minimum
weight Steiner tree. This is well known to be NP-hard. Approximate (but not $\epsilon$-approximate for
any given $\epsilon$) polynomial complexity algorithms are known (see [24] and citations therein) for finding a
minimum weight Steiner tree. These can also be used to find an approximate solution to the multicast problem, and
hence to the 'sum' problem, in polynomial complexity [24].

2. Multiple functions and multiple terminals: Suppose the network has multiple terminals $t_1, t_2, \ldots, t_\gamma$
wanting functions $\Theta_1(X^{(1)}), \Theta_2(X^{(2)}), \ldots, \Theta_\gamma(X^{(\gamma)})$ respectively. Here $X^{(i)}$ is the data generated by a
set of sources $S^{(i)}$. The sets $S^{(i)}$; $i = 1, 2, \ldots, \gamma$ are assumed to be pairwise disjoint. For each function
$\Theta_i$, a computation tree $\mathcal{G}_i$ is given. Let us consider the problem of communicating the functions to the
respective terminals at rates $\lambda_1, \lambda_2, \ldots, \lambda_\gamma$. The problem is to determine the achievable rate region, which
is defined as the set of $r = (\lambda_1, \lambda_2, \ldots, \lambda_\gamma)$ for which a protocol exists for transmission of the functions
at these rates. This region can be approximately found by solving either of the following problems.

(i) For any given non-negative weights $\alpha_1, \alpha_2, \ldots, \alpha_\gamma$, what is the maximum achievable weighted
sum-rate $\sum_{i=1}^{\gamma} \alpha_i \lambda_i$?

For this problem, we consider embeddings of the computation trees $\mathcal{G}_i$ into the network for each terminal
$t_i$. Let $\mathcal{B}_i$ denote the set of all embeddings of $\mathcal{G}_i$. Then the Embedding-Edge LP for this problem is to
maximize $\sum_{i=1}^{\gamma} \alpha_i \sum_{B \in \mathcal{B}_i} x(B)$. The constraints are the same as before with $\mathcal{B}$ defined by $\mathcal{B} = \cup_i \mathcal{B}_i$.
The weight of an embedding $B \in \mathcal{B}$ under a weight function $L$ is defined as $\alpha_i w_L(B)$ if $B \in \mathcal{B}_i$. The
new OptimalEmbedding(L) algorithm finds an optimal embedding for each $\mathcal{G}_i$ and chooses the one with
minimum weight. This can be used in the same primal-dual algorithm to find an $\epsilon$-approximate solution. It
is also easy to obtain a Node-Arc LP for this problem by minor modifications to that for a single function
computation at a single terminal.






(ii) For any non-negative demands $\alpha_1, \alpha_2, \ldots, \alpha_\gamma$, what is the maximum $\lambda$ for which the rates
$\lambda\alpha_1, \lambda\alpha_2, \ldots, \lambda\alpha_\gamma$ are concurrently achievable?

Here, we define an embedding to be a tuple $B = (B_1, B_2, \ldots, B_\gamma)$, where $B_i \in \mathcal{B}_i$ is an embedding
of the computation tree $\mathcal{G}_i$. The Embedding-Edge LP for this problem is the same as that for the single
terminal problem with $r_B(e)$ defined as $r_B(e) = \sum_{i=1}^{\gamma} \alpha_i |\{\theta \in \Gamma_i : e \text{ is a part of } B_i(\theta)\}|$ and
$\mathcal{B} = \mathcal{B}_1 \times \mathcal{B}_2 \times \ldots \times \mathcal{B}_\gamma$. The weight of an embedding $B$ under a weight function $L$ is defined as $\sum_{i=1}^{\gamma} \alpha_i w_L(B_i)$.
The new OptimalEmbedding(L) algorithm finds an optimal embedding $B$ by separately finding optimal
embeddings $B_i$ for each $\mathcal{G}_i$. This can be used in the same primal-dual algorithm to find an $\epsilon$-approximate
solution. Again, we can easily obtain a Node-Arc LP by minor modification to that for a single function
computation at a single terminal.
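
The reason the oracle for problem (ii) can treat the trees separately is that the weight $\sum_i \alpha_i w_L(B_i)$ is a sum of terms, each depending on one component $B_i$ only, with $\alpha_i \ge 0$. The sketch below checks this on toy data. The candidate lists, labels and function names are hypothetical; in the algorithm itself the per-tree minimum comes from OptimalEmbedding(L), not from enumeration.

```python
from itertools import product

def concurrent_oracle(alpha, candidates):
    """Pick the minimum-weight embedding of each tree independently.

    alpha      : list of non-negative demands alpha_i
    candidates : candidates[i] is a list of (w_L(B_i), label) pairs for tree i
    Returns (weight of the tuple, tuple of chosen labels).
    """
    picks = [min(c, key=lambda wb: wb[0]) for c in candidates]
    weight = sum(a * w for a, (w, _) in zip(alpha, picks))
    return weight, tuple(label for _, label in picks)

def brute_force_weight(alpha, candidates):
    """Minimum weight over the full product set B_1 x ... x B_gamma."""
    return min(
        sum(a * w for a, (w, _) in zip(alpha, combo))
        for combo in product(*candidates)
    )

# Toy data: two trees with two candidate embeddings each.
alpha = [1.0, 2.0]
candidates = [[(3.0, "B1a"), (2.5, "B1b")], [(1.0, "B2a"), (4.0, "B2b")]]
weight, chosen = concurrent_oracle(alpha, candidates)
```

The separate search costs the sum of the per-tree costs, whereas the product set grows multiplicatively in the number of terminals.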

3. Computing with a precision: In practice, the source data may be real-valued, and communicating
such data requires infinite capacity. In such applications, it is common to require a quantized value
of the function at the terminal with a desired precision. This may, in turn, be achieved by quantizing
various data types with pre-decided precisions, and thus different data types may require different numbers
of bits to represent them. Suppose the data type denoted by $\theta$ is represented using $b(\theta)$ bits. Then the
Embedding-Edge LP and its dual for this problem are the same as before except that the definition of
$r_B(e)$ is changed to $r_B(e) = \sum_{\theta \in \Gamma :\, e \text{ is a part of } B(\theta)} b(\theta)$. In the Node-Arc LP, the capacity constraints are
changed to

+ m<c(uv), VuveE. 

In the OptimalEmbedding(L) algorithm, $l(uv)$ is replaced by $l(uv)\,b(\theta_i)$ inside the foreach loop.
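
As an illustration of the modified load $r_B(e)$, the following sketch sums $b(\theta)$ over the data types whose path uses a link. The representation of an embedding as a dictionary from data type to node path, the bit widths and the name `link_load` are assumptions made for this example; links are treated as undirected.

```python
from collections import defaultdict

def link_load(embedding, bits):
    """Return r_B(e) for every link used by the embedding.

    embedding : dict theta -> list of nodes, the path B(theta)
    bits      : dict theta -> b(theta), bits per symbol of theta
    """
    load = defaultdict(int)
    for theta, path in embedding.items():
        for u, v in zip(path, path[1:]):          # consecutive nodes form a link
            load[frozenset((u, v))] += bits[theta]
    return dict(load)

# Hypothetical embedding: X travels s1-u-w, Y travels s2-w, X+Y travels w-t.
embedding = {"X": ["s1", "u", "w"], "Y": ["s2", "w"], "X+Y": ["w", "t"]}
bits = {"X": 8, "Y": 8, "X+Y": 9}   # assumed precisions
load = link_load(embedding, bits)
```

Setting every $b(\theta)$ to 1 recovers the original definition, in which $r_B(e)$ simply counts the data types routed over $e$.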

4. Energy-limited sensors: Suppose, instead of capacity constraints on the links, each node $u \in V$ has
a total energy $E(u)$. Each transmission and reception of one symbol of $\theta$ requires the energy $E_{T,\theta}$ and $E_{R,\theta}$ respectively.
Generation of one symbol of $\theta$ or computation of one symbol of $\theta$ from $\Phi_\uparrow(\theta)$ requires the energy $E_{C,\theta}$.
The objective is to compute the function at the terminal the maximum number of times with the given total
node energy at each node.

For an embedding $B$, if $B(\theta) = v_1, v_2, \ldots, v_l$ then $\mathrm{tx}(B(\theta)) = \{v_1, v_2, \ldots, v_{l-1}\}$ denotes the
transmitting nodes, and $\mathrm{rx}(B(\theta)) = \{v_2, v_3, \ldots, v_l\}$ denotes the receiving nodes of $\theta$. If $l = 1$, then
$\mathrm{tx}(B(\theta)) = \mathrm{rx}(B(\theta)) = \emptyset$. For $B$, the energy load on the node $u$ is given by

$$E_B(u) = \sum_{\theta :\, \mathrm{start}(B(\theta)) = u} E_{C,\theta} + \sum_{\theta :\, u \in \mathrm{tx}(B(\theta))} E_{T,\theta} + \sum_{\theta :\, u \in \mathrm{rx}(B(\theta))} E_{R,\theta}.$$

The capacity constraint in the Embedding-Edge LP is replaced by the energy constraint on the nodes

$$\sum_{B \in \mathcal{B}} x(B) E_B(u) \le E(u) \quad \forall u \in V,$$

where an empty sum is defined to be 0. The dual of the Embedding-Edge LP is: Minimize
$D(L) = \sum_{u \in V} E(u) l(u)$ subject to

1. Constraints corresponding to each $x(B)$ in primal:

$$\sum_{u \in B} E_B(u) l(u) \ge 1, \quad \forall B \in \mathcal{B} \tag{12}$$

2. Non-negativity constraints:

$$l(u) \ge 0, \quad \forall u \in V. \tag{13}$$

The weight or cost of an embedding can be defined as

$$w_L(B) = \sum_{u \in B} E_B(u) l(u).$$

The OptimalEmbedding(L) algorithm is modified in the weight initialization and weight update. The weight
initialization is done as $\omega_{s_i}(\theta_i) := E_{C,\theta_i}$ for source data and $\omega_u(\theta_i) := E_{C,\theta_i} + \sum_{\eta \in \Phi_\uparrow(\theta_i)} \omega_u(\eta)$ for other data.
The weight update at $u$ is now done as $\omega_u(\theta_i) := \omega_v(\theta_i) + E_{T,\theta_i} + E_{R,\theta_i}$ if $\omega_v(\theta_i) + E_{T,\theta_i} + E_{R,\theta_i} < \omega_u(\theta_i)$.
After suitable modification, the primal-dual algorithm with the modified OptimalEmbedding(L) algorithm
finds an $\epsilon$-approximate solution.

In the Node-Arc LP, the capacity constraints are replaced by energy constraints at the nodes:

fu U E Cfi + ( A^r.' + flE R>e ) < E(u) \/u e V. 
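
The node energy load $E_B(u)$ and the embedding weight $w_L(B)$ defined in this item can be computed directly from the paths of an embedding. The sketch below is an illustration under assumptions: an embedding is a dictionary from data type to node path, $\mathrm{start}(B(\theta))$ is taken to be the first node of the path, and the energy values and node weights are made up.

```python
def energy_load(embedding, e_c, e_t, e_r):
    """Return dict node -> E_B(u) for one embedding.

    embedding     : dict theta -> list of nodes v_1, ..., v_l (the path B(theta))
    e_c, e_t, e_r : dict theta -> generation/computation, transmission and
                    reception energy per symbol of theta
    """
    load = {}
    for theta, path in embedding.items():
        start = path[0]                       # theta is generated or computed here
        load[start] = load.get(start, 0.0) + e_c[theta]
        for u in path[:-1]:                   # tx nodes v_1 .. v_{l-1}
            load[u] = load.get(u, 0.0) + e_t[theta]
        for u in path[1:]:                    # rx nodes v_2 .. v_l
            load[u] = load.get(u, 0.0) + e_r[theta]
    return load

def embedding_weight(load, l):
    """w_L(B): sum of E_B(u) * l(u) over the nodes used by the embedding."""
    return sum(e * l[u] for u, e in load.items())

# Hypothetical embedding: X and Y meet at v, the sum travels v-w-t.
embedding = {"X": ["s1", "v"], "Y": ["s2", "v"], "X+Y": ["v", "w", "t"]}
types = list(embedding)
load = energy_load(embedding,
                   e_c={th: 1.0 for th in types},
                   e_t={th: 2.0 for th in types},
                   e_r={th: 1.0 for th in types})
weight = embedding_weight(load, {u: 1.0 for u in load})
```

A single-node path contributes only the computation energy, matching the convention $\mathrm{tx}(B(\theta)) = \mathrm{rx}(B(\theta)) = \emptyset$ for $l = 1$.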



V. Discussion and conclusion 

In this paper, we have laid the foundations for network flow techniques for distributed function
computation. Though we have obtained results for computation trees, we believe that many of our techniques
can be extended to larger classes of functions, for instance, the fast Fourier transform (FFT), that can be
represented by more general graphical structures like directed acyclic graphs and hypergraphs where each
edge or hyper-edge represents a distinct function of the sources. The sum function discussed in Sec. IV
is one such function representable by a hypergraph.

Our computation framework does not allow block coding, i.e., coding across different realizations of
the data. Such coding has been used in the information theory and network coding literature. Block
coding can, in general, offer a better computation rate. For example, consider the directed butterfly network
shown in Fig. 3 with two binary source nodes (with source processes denoted by $X$ and $Y$) and a
terminal node with an XOR target function $\Theta(X, Y) = X \oplus Y$. It can be checked that the maximum rate
achievable by routing-like schemes, i.e., without using inter-realization coding, is 1.5. On the other hand,
the scheme shown in Fig. 3(b) using inter-realization coding achieves a rate of 2. However, for more
general functions, finding the optimal rate and designing optimal coding schemes is a difficult problem
under this framework. Further, for undirected multicast networks, it is known that inter-realization
coding can achieve a rate strictly less than twice the rate achieved by routing [25]. We expect that similar
results will hold for function computation over undirected networks.

Altogether, we believe that the results in this paper open many new avenues for further research.




Fig. 3. The butterfly network with XOR target function $\Theta(X, Y) = X \oplus Y$. (a) The butterfly network. Each edge has capacity 1 bit/use. (b) A rate-2 solution using cross-realization coding.



VI. Acknowledgement 

The authors would like to thank A. Diwan for fruitful discussions. This work was supported in part 
by Bharti Centre for Communication at IIT Bombay and a project from the Department of Science and 
Technology (DST), India. 

References 

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows. Prentice Hall Inc., 1993.
[2] N. Garg and J. Könemann, "Faster and simpler algorithms for multicommodity flow and other fractional packing problems," in Proc. FOCS, 1998.
[3] T. Leighton, F. Makedon, S. Plotkin, C. Stein, S. Tragoudas, and E. Tardos, "Fast approximation algorithms for multicommodity flow problems," J. Comput. System Sci., vol. 50, pp. 228-243, 1995.
[4] F. Shahrokhi and D. Matula, "The maximum concurrent flow problem," J. ACM, vol. 37, pp. 318-334, 1990.
[5] R. G. Gallager, "Finding parity in simple broadcast networks," IEEE Transactions on Information Theory, vol. 34, pp. 176-180, 1988.
[6] E. Kushilevitz and Y. Mansour, "Computation in noisy radio networks," in Proceedings of the 9th annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 236-243.
[7] U. Feige and J. Kilian, "Finding OR in noisy broadcast network," Information Processing Letters, vol. 73, no. 1-2, pp. 69-75, January 2000.
[8] A. Giridhar and P. R. Kumar, "Computing and communicating functions over sensor networks," IEEE Journal on Selected Areas in Communications, vol. 23, no. 4, pp. 755-764, April 2005.
[9] L. Ying, R. Srikant, and G. Dullerud, "Distributed symmetric function computation in noisy wireless sensor networks with binary data," in Proc. of the 4th International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless networks (WiOpt), April 2006, pp. 1-9.
[10] Y. Kanoria and D. Manjunath, "On distributed computation in noisy random planar networks," in Proceedings of IEEE International Symposium on Information Theory, Nice, France, June 2007.
[11] S. Kamath and D. Manjunath, "On distributed function computation in structure-free random networks," in Proceedings of IEEE International Symposium on Information Theory, Toronto, Canada, July 2008.
[12] J. Körner and K. Marton, "How to encode the modulo-two sum of binary sources," IEEE Trans. Inform. Theory, vol. 25, no. 2, pp. 219-221, 1979.
[13] T. S. Han and K. Kobayashi, "A dichotomy of functions f(x,y) of correlated sources (x,y)," IEEE Trans. Inform. Theory, vol. 33, no. 1, pp. 69-86, 1987.
[14] A. Orlitsky and J. R. Roche, "Coding for computing," IEEE Trans. Inform. Theory, vol. 47, no. 3, pp. 903-917, 2001.
[15] H. Feng, M. Effros, and S. A. Savari, "Functional source coding for networks with receiver side information," in Proceedings of the Allerton Conference on Communication, Control, and Computing, September 2004.
[16] B. K. Rai and B. K. Dey, "Sum-networks: system of polynomial equations, reversibility, insufficiency of linear network coding, unachievability of coding capacity," submitted to IEEE Trans. Inform. Th., available at http://arxiv.org/abs/0906.0695.
[17] R. Appuswamy, M. Franceschetti, N. Karamchandani, and K. Zeger, "Network coding for computing part I: Cut-set bounds," submitted to IEEE Trans. Inform. Th., available at http://arxiv.org/abs/0912.2820.
[18] M. Langberg and A. Ramamoorthy, "Communicating the sum of sources in a 3-sources/3-terminals network," in Proceedings of IEEE International Symposium on Information Theory, Seoul, Korea, 2009.
[19] F. T. Leighton, M. J. Newman, A. G. Ranade, and E. J. Schwabe, "Dynamic tree embeddings in butterflies and hypercubes," SIAM Journal on Computing, vol. 21, no. 4, pp. 639-654, 1992.
[20] O. Wohlmuth and F. Mayer-Lindenberg, "A method for the embedding of arbitrary communication topologies into configurable parallel computers," in Proceedings of the 1998 ACM Symposium on Applied Computing, 1998, pp. 569-574.
[21] V. Heun and E. W. Mayr, "Efficient dynamic embeddings of arbitrary binary trees into hypercubes," Journal of Algorithms, vol. 43, pp. 51-84, 2002.
[22] G. Karakostas, "Faster approximation schemes for fractional multicommodity flow problems," ACM Trans. Algorithms, vol. 4, 2008, pp. 1-17.
[23] S. Plotkin, D. Shmoys, and E. Tardos, "Fast approximation algorithms for fractional packing and covering problems," Math. Oper. Res., vol. 20, pp. 257-301, 1995.
[24] M. Saad, T. Terlaky, A. Vannelli, and H. Zhang, "Packing trees in communication networks," J. Comb. Optim., vol. 16, pp. 402-423, 2008.
[25] Z. Li and B. Li, "Network coding in undirected networks," Proc. 38th CISS, Princeton, NJ, Mar. 2004, pp. 257-262.

Appendix A 
The protocol 

We now outline a communication and computation protocol designed to receive the function at the
terminal at a rate that is greater than $\sum_{B \in \mathcal{B}} x(B) - \epsilon$ for any given solution of the Embedding-Edge LP.
First, the flow values $x(B)$ are rounded down to rational numbers so that the total flow $r$ is still greater
than $\sum_{B \in \mathcal{B}} x(B) - \epsilon$. With abuse of notation, we use the same notation $x(B)$ to denote these rounded
values of $x(B)$ in the rest of this subsection. All these flows are then multiplied by the least common
multiple $N$ of the denominators of the flows $x(B)$; $B \in \mathcal{B}$. Let the resulting values be $n(B)$; $B \in \mathcal{B}$.
Clearly $\sum_{B \in \mathcal{B}} n(B) = rN$. Let us fix an order in the embeddings $B_1, B_2, \ldots, B_{|\mathcal{B}|}$. The protocol consists
of computation at the nodes and communication across the links in a block/frame of $N$ consecutive uses
of the network. In each frame, a link $e$ can carry up to a total of $Nc(e)$ symbols in both directions. Our
protocol will require sending an integer number of symbols in $N$ uses of $e$ in each direction. We assume that
this is possible as long as the total number of symbols transmitted in both directions is at most $Nc(e)$.
We assume that computation at nodes is done instantaneously, and a frame sent across a link is available
at the receiving node at the end of the frame. The receiving node can forward the data on another edge
in the next frame or use it to compute something else for transmission in the next or later frames.
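
The rounding and scaling step at the start of this paragraph can be sketched as follows. Rounding down to a fixed decimal denominator is only one possible choice and is an assumption of this example (the text merely requires rational values with a total loss below $\epsilon$); the name `frame_counts` and the parameter `digits` are hypothetical.

```python
from fractions import Fraction
from math import floor, lcm   # math.lcm requires Python 3.9 or later

def frame_counts(x, digits=3):
    """Round flows down to rationals and scale them to integers.

    x      : dict embedding label -> flow value x(B)
    digits : each flow is rounded down to a multiple of 10**-digits, so each
             embedding loses less than 10**-digits of flow
    Returns (N, n) with N the least common multiple of the denominators of
    the rounded flows and n[B] = N * x(B) an integer.
    """
    q = 10 ** digits
    rounded = {B: Fraction(floor(Fraction(v) * q), q) for B, v in x.items()}
    N = lcm(*(f.denominator for f in rounded.values()))
    n = {B: int(f * N) for B, f in rounded.items()}
    return N, n

# Flows x(B1) = 1 and x(B2) = 0.5 give N = 2, n(B1) = 2, n(B2) = 1.
N, n = frame_counts({"B1": 1.0, "B2": 0.5})
symbols_per_frame = sum(n.values())   # equals rN
```

With these values each frame of $N = 2$ network uses delivers $rN = 3$ function values, two over the first embedding and one over the second.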

In our protocol, the data stream generated at each source is divided into blocks of $rN$ symbols, and
the terminal computes $rN$ corresponding function values in each frame. Out of the $rN$
computations, the first $n(B_1)$ are carried out using the embedding $B_1$, the next $n(B_2)$ are carried out
using the embedding $B_2$, and so on. In each direction on each link, the transmissions corresponding to
different embeddings are ordered in the same order as the embeddings. Further, if $uv$ is in $B(\theta_i)$ as well
as $B(\theta_j)$ (assume $i < j$ without loss of generality), then $uv$ carries the data for $(B, \theta_i)$ first and then
the data for $(B, \theta_j)$. Formally, in each frame and in each direction, a link $uv$ in $\mathcal{N}$ carries a subframe,
possibly empty, of data for each $(B, \theta)$ pair, where $B \in \mathcal{B}$, $\theta \in \Gamma$. These subframes are transmitted in
the lexicographic order on $(B, \theta)$. Since the subframes for different $(B, \theta)$ may be available at $u$ with
different delay, these subframes will not correspond to the same frame of source data. In the following,
we explicitly describe the subframes carried by $uv$ in the $k$-th frame.

Let $y^k_{B,\theta}$ denote the $n(B)$ symbols of data of type $\theta$ corresponding to the $n(B)$ symbols of source data
in the $k$-th frame corresponding to the embedding $B$. That is, $y^k_{B_1,\theta}$ denotes the $n(B_1)$ symbols of data of
type $\theta$ corresponding to the first $n(B_1)$ symbols of source data in the $k$-th frame, $y^k_{B_2,\theta}$ denotes the $n(B_2)$
symbols of data of type $\theta$ corresponding to the next $n(B_2)$ symbols of source data in the $k$-th frame, and
so on. In each frame, $uv$ carries a subframe of data for each $(B, \theta)$ pair. The subframe corresponding to
$(B, \theta)$ is empty if $uv \notin B(\theta)$. Formally,

$$y^k_{uv,B,\theta} = \begin{cases} y^k_{B,\theta} & \text{if } uv \in B(\theta), \\ \emptyset & \text{otherwise.} \end{cases}$$

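This subframe corresponds to the $k$-th block of source data. These subframes may be available at $u$ with
variable delay due to variable path lengths from the sources along different embeddings. Let us define
the depth or delay $d(uv, B, \theta)$ as

$$d(uv, B, \theta) = \begin{cases}
\infty & \text{if } uv \notin B(\theta), \\
0 & \text{if } uv \in B(\theta),\ u = s_i,\ \theta = \theta_i, \\
1 + \max\{d(wu, B, \eta) \mid \eta \in \Phi_\uparrow(\theta),\ wu \in B(\eta)\} & \text{if } uv \in B(\theta),\ u = \mathrm{start}(B(\theta)),\ (u, \theta) \neq (s_i, \theta_i), \\
d(wu, B, \theta) + 1 & \text{if } (u, \theta) \neq (s_i, \theta_i),\ wu, uv \in B(\theta).
\end{cases} \tag{14}$$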



So, the subframe $y^k_{uv,B,\theta}$, which has $n(B)$ symbols if $uv \in B(\theta)$ and which corresponds to the $k$-th
frame of source data, will be transmitted in the $(k + d(uv, B, \theta))$-th frame on $uv$. The infinite value for
$uv \notin B(\theta)$ indicates that the corresponding data does not flow through $uv$ from $u$ to $v$.
Example: Consider the network and the computation tree shown in Fig. HI The edges of the computation 
tree are labeled by the functions they carry, that is, $X$, $Y$, and $X + Y$. For embedding $B_1$, $d(s_1v, B_1, X) = 0$,
$d(s_2v, B_1, Y) = 0$, $d(vw, B_1, X + Y) = 1$, $d(wt, B_1, X + Y) = 2$, and all other delay values are $\infty$. For
embedding $B_2$, $d(s_1u, B_2, X) = 0$, $d(s_2w, B_2, Y) = 0$, $d(uw, B_2, X) = 1$, $d(wt, B_2, X + Y) = 2$, and all
other delay values are $\infty$.
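
The delay values in this example can be reproduced mechanically from the recursion (14). The sketch below is an illustration under assumptions: an embedding is a dictionary from data type to node path, `inputs` lists the data types from which each data type is computed (empty for source data), and an input whose path is a single node is treated as available without any transmission delay, a case that (14) does not spell out. Links absent from the result have delay $\infty$.

```python
def delays(embedding, inputs):
    """Return dict (u, v, theta) -> d(uv, B, theta) for the links on B(theta).

    embedding : dict theta -> list of nodes, the path B(theta)
    inputs    : dict theta -> list of input data types (empty for source data)
    """
    d = {}
    avail = {}   # theta -> frame offset at which theta is usable at its last node

    def available(theta):
        if theta not in avail:
            # Source data is ready at offset 0; computed data must wait for
            # the slowest of its inputs.
            ready = max((available(eta) for eta in inputs[theta]), default=0)
            path = embedding[theta]
            for hop, (u, v) in enumerate(zip(path, path[1:])):
                d[(u, v, theta)] = ready + hop     # one extra frame per hop
            avail[theta] = ready + len(path) - 1
        return avail[theta]

    for theta in embedding:
        available(theta)
    return d

inputs = {"X": [], "Y": [], "X+Y": ["X", "Y"]}
B1 = {"X": ["s1", "v"], "Y": ["s2", "v"], "X+Y": ["v", "w", "t"]}
B2 = {"X": ["s1", "u", "w"], "Y": ["s2", "w"], "X+Y": ["w", "t"]}
d1 = delays(B1, inputs)
d2 = delays(B2, inputs)
```

For $B_2$ the sum can only leave $w$ in offset 2 because $X$ needs two hops to reach $w$, even though $Y$ arrives one frame earlier.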

The data transmitted in the $k$-th frame from $u$ to $v$ on the link $uv$, in order of transmission, is thus

$$y^{k-d(uv,B_1,\theta_1)}_{uv,B_1,\theta_1},\ y^{k-d(uv,B_1,\theta_2)}_{uv,B_1,\theta_2},\ \ldots,\ y^{k-d(uv,B_1,\theta_{|\Gamma|})}_{uv,B_1,\theta_{|\Gamma|}},\ y^{k-d(uv,B_2,\theta_1)}_{uv,B_2,\theta_1},\ y^{k-d(uv,B_2,\theta_2)}_{uv,B_2,\theta_2},\ \ldots,\ y^{k-d(uv,B_2,\theta_{|\Gamma|})}_{uv,B_2,\theta_{|\Gamma|}},\ \ldots,\ y^{k-d(uv,B_{|\mathcal{B}|},\theta_{|\Gamma|})}_{uv,B_{|\mathcal{B}|},\theta_{|\Gamma|}}.$$

It is easy to see that the required flow of function values will be
computed on each embedding by this protocol. If the communication starts with the first frame and
ends with the $K$-th frame of source data, then the subframes are empty for $k < d(uv, B_i, \theta_j)$ and for
$k > K + d(uv, B_i, \theta_j)$. In particular, a subframe $y^{k-d(uv,B_i,\theta_j)}_{uv,B_i,\theta_j}$ is empty if $uv \notin B_i(\theta_j)$.
Example: In the above example, suppose a solution of the Embedding-Edge LP is $x(B_1) = 1$ and
$x(B_2) = 0.5$. Then $N = 2$, and $n(B_1) = 2$, $n(B_2) = 1$. Each data stream is divided into frames of
3 symbols, out of which the first 2 symbols flow over $B_1$ and the last symbol flows over $B_2$. In the
$k$-th frame, the link $uw$ carries only one non-empty subframe for $B_2$ containing one '$X$' symbol. That
subframe $y^{k-1}_{uw,B_2,X}$ corresponds to the last symbol of the $(k-1)$-th frame of data. The link $wt$ carries one
subframe of two '$X + Y$' symbols for $B_1$ and another subframe of one '$X + Y$' symbol for $B_2$. These
subframes $y^{k-2}_{wt,B_1,X+Y}$, $y^{k-2}_{wt,B_2,X+Y}$ correspond to the first two symbols of the $(k-2)$-th data frame and
the last symbol of the $(k-2)$-th data frame respectively.
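
The composition of a transmitted frame in this example can be listed programmatically. The sketch below is a hedged illustration: the function name, the string labels for links, and the dictionary layout are hypothetical, and it assumes that frames of source data are numbered from 1, so that a subframe is non-empty only when $k - d(uv, B, \theta) \ge 1$. The end of the communication (the $K$-th frame) is not modelled.

```python
def frame_composition(k, link, order, d, n):
    """List the non-empty subframes sent on `link` in the k-th frame.

    order : list of (B, theta) pairs in transmission (lexicographic) order
    d     : dict (link, B, theta) -> delay; a missing key means infinite delay
    n     : dict B -> n(B), symbols per subframe of embedding B
    Returns a list of (B, theta, source frame index, number of symbols).
    """
    out = []
    for B, theta in order:
        delay = d.get((link, B, theta))
        if delay is not None and k - delay >= 1:   # assumed 1-based frame numbering
            out.append((B, theta, k - delay, n[B]))
    return out

order = [(B, th) for B in ("B1", "B2") for th in ("X", "Y", "X+Y")]
d = {("s1v", "B1", "X"): 0, ("s2v", "B1", "Y"): 0,
     ("vw", "B1", "X+Y"): 1, ("wt", "B1", "X+Y"): 2,
     ("s1u", "B2", "X"): 0, ("s2w", "B2", "Y"): 0,
     ("uw", "B2", "X"): 1, ("wt", "B2", "X+Y"): 2}
n = {"B1": 2, "B2": 1}
on_uw = frame_composition(5, "uw", order, d, n)
on_wt = frame_composition(5, "wt", order, d, n)
```

For $k = 5$ the link $uw$ carries one symbol from source frame 4, and $wt$ carries two symbols for $B_1$ followed by one symbol for $B_2$, all from source frame 3, in agreement with the example.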

To implement the protocol, any node $u$ needs to know $N$, $n(B)$ for all embeddings with non-zero
$n(B)$, and $d(uv, B, \theta)$ and $d(vu, B, \theta)$ for all such embeddings $B$, $\theta \in \Gamma$, $v \in N(u)$. The values of
$d(uv, B, \theta)$ can be found in $O(nb|\Gamma|)$ time, where $b$ is the number of embeddings for which $n(B) > 0$.
In the following, we give the sequence of actions taken by any node $u$.

1. The node maintains an input queue for each $(B, \theta)$ pair for which $d(vu, B, \theta) < \infty$ for some
$v \in N(u)$.

2. For the $k$-th frame received from $v$ on the link $vu$, the node $u$ knows the 'composition', i.e., how
many symbols for which $(B, \theta)$ pair are received on that frame and in what order. This is because the
frame contains a non-empty subframe corresponding to $(B, \theta)$ if and only if $d(vu, B, \theta) < k$. Such a
non-empty subframe contains exactly $n(B)$ symbols. The transmission of all the non-empty subframes is ordered in
the lexicographic ordering of $(B, \theta)$. For any received frame on any link, $u$ puts each received subframe
in its respective input queue. If $u$ is a source, it also takes the $rN$ generated symbols and creates the
subframes of lengths $n(B)$ for all the relevant embeddings. Those are also placed in respective queues.

3. After queueing all the received and generated data in the $k$-th frame, $u$ prepares the data to be
transmitted on each link $uv$ in the next, that is $(k+1)$-th, frame of $N$ transmissions. The non-empty
subframes for this transmitted frame are those for which $d(uv, B, \theta) < k + 1$. If there is an input queue
for $(B, \theta)$, i.e., if such a data subframe is received at $u$, then this subframe of $n(B)$ symbols is taken from
the respective input queue. Otherwise, this subframe is generated from the subframes from the queues
for $(B, \eta)$; $\eta \in \Phi_\uparrow(\theta)$. If such a queue for $(B, \eta)$ contains multiple subframes of $n(B)$ symbols, then the
oldest of them is taken. For instance, in our example (Fig. H]), for constructing the subframe $y^{k-2}_{wt,B_2,X+Y}$
at $w$ for the $k$-th frame, $w$ takes a subframe from its input queue $(B_2, X)$ and a subframe from the input
queue $(B_2, Y)$ and adds them. At this time, in the first queue, there is only one subframe $y^{k-2}_{uw,B_2,X}$, which
is used now. But in the second queue, there are two subframes $y^{k-1}_{s_2w,B_2,Y}$ and $y^{k-2}_{s_2w,B_2,Y}$ available, out of
which the older subframe $y^{k-2}_{s_2w,B_2,Y}$ is used.



